Lab 1: Intro to R and data analysis
Day 1
Topics
- Introduction to R and R-studio
- Why R?
- Understanding different types of variables
- Handling R objects: vectors, matrix, data.frame
- Descriptive statistics
- Measures of central tendency, measures of variability (or spread), and frequency distribution
- Visual data exploration
- {ggplot2}
- Foundations of inference
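As a small preview of the topics above, the sketch below builds a vector, a factor, a matrix, and a data.frame, then computes a few descriptive statistics. The values and the names weights, group, counts, and lab_data are made up for illustration and do not come from the lab datasets.

```r
# A numeric vector of hypothetical measurements (continuous variable)
weights <- c(3.2, 4.1, 3.8, 5.0, 4.1)

# A categorical variable stored as a factor
group <- factor(c("a", "b", "a", "b", "a"))

# A matrix holds values of a single type arranged in rows and columns
counts <- matrix(1:6, nrow = 2)

# A data.frame combines columns of different types, one row per observation
lab_data <- data.frame(group = group, weights = weights)

# Measures of central tendency
mean(weights)
median(weights)

# Measures of variability (spread)
sd(weights)
range(weights)

# Frequency distribution of the categorical variable
table(lab_data$group)
```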
Lab code
Here you will find the solved problems addressed in Lab 1
- as .qmd file (commented solutions)
- as .R file
Lab datasets
Below are the datasets used in the Practice session:
- .csv file
- local subfolder
This workshop showcases introductory biostatistics concepts using the open source (and free!) programming language R. Each session of the workshop features exercises that will help you learn by doing. Therefore, it is recommended that you pre-install the following on your machine.
Below is a quick step-by-step process that can help you get started.
Install R
R is available for free for Windows, GNU/Linux, and macOS.
- To install R, you can go to this link. The latest available release is R 4.3.3 “Angel Food Cake”, released on 2024-02-29, but any (fairly recent) version will do.
If you have previously installed R on your machine, you can check which version you are running by executing this command in R:
# From the R console
base::R.version.string
# (This is the version on my own machine)
# [1] "R version 4.2.2 (2022-10-31)"
…or by executing this command in your CLI (Command Line Interface):
# From Terminal/PowerShell/bash
R --version
Install RStudio IDE
While not strictly required, it is highly recommended that you also install RStudio to facilitate your work. RStudio Desktop is an Integrated Development Environment (IDE), basically a graphical interface wrapping and interfacing with R (which needs to be installed first).
R, which is a command line driven program, can be executed via its native interface (R GUI), as well as from many other code editors, like VS Code, Sublime Text, Jupyter Notebook, etc. RStudio remains the most widely used by beginners and advanced programmers alike, because of its intuitive and integrated interface.
- To install RStudio you can go to this link. The free version contains everything you need.

Install R packages from CRAN
An R package is a shareable bundle of functions. Besides the basic built-in functions already contained in the program (i.e. the base package), many useful R functions come in free libraries of code (or packages) written by R’s user community.
CRAN - the Comprehensive R Archive Network - is the general package repository for R: https://cran.r-project.org/.
Bioconductor -
Github -
Installing and using R packages
#? https://r-training.pages.uni.lu/biostat1/install_tutorial.html
Let’s take for example the R package corrplot, a package for graphically exploring correlations. To install it for the first time, open an R session and execute:
# Installing (ONLY the 1st time)
utils::install.packages('corrplot')
Here you are actually using a function (install.packages) of a pre-installed package (utils) using the syntax packagename::function_name.
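To see the packagename::function_name syntax at work without installing anything, here is a minimal sketch that calls median() from the pre-installed stats package in both ways. The vector values is made up for illustration.

```r
# Hypothetical values, used only to have something to summarize
values <- c(2, 4, 4, 9)

# Fully qualified call: package name, two colons, function name
qualified <- stats::median(values)

# Unqualified call: works because stats is attached in every R session
unqualified <- median(values)

# Both calls reach the same function, so the results match
identical(qualified, unqualified)
```

The qualified form makes it explicit where a function comes from, which helps when two attached packages export a function with the same name.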
Once you have installed a package, at every subsequent R session, you will only need to load it, like so:
# Loading (at every session)
library(corrplot)
To inquire about a package and/or its functions, you can write ?package_name or ?function_name in your console, and RStudio will open a help page in its dedicated pane:
# Opening Help page on package/function
?corrplot
You can also install and update packages using the “Packages” tab in the lower right pane of RStudio.
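Besides help pages, a package can also be inspected directly from the console. The sketch below uses the pre-installed stats package as a stand-in so that it runs on a fresh installation; the variable names pkg and exported are illustrative, and any installed package name could be substituted.

```r
# Package to inspect (stats ships with R, so no installation is needed)
pkg <- "stats"

# Version of the installed package
as.character(utils::packageVersion(pkg))

# Names of the functions and objects the package exports
exported <- getNamespaceExports(pkg)
head(sort(exported))

# Check whether a specific function is provided by the package
"median" %in% exported
```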
Managing files and projects
In any analytical endeavor it is very likely that you will handle a collection of files (likely organized in folders, such as input_data, output_data, R_scripts, paper, etc.). RStudio provides a fantastic tool for organizing all the files pertaining to a project, called an “R Project”.
Creating an R Project
Creating an R Project will keep all the files associated with a project (including invisible ones!) organized together – input data, R scripts, analytical results, figures. Besides being common practice, this has the advantage of implicitly setting the “working directory”, which is incredibly important when you need to load or output files, specifying their file path.
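To make the role of the working directory concrete, here is a minimal sketch of how a relative file path is resolved against it. The folder and file names (input_data, dataset.csv) are placeholders, and the file does not need to exist for the code to run.

```r
# The working directory: relative paths are resolved against this folder
wd <- getwd()

# A relative path, built without typing any separator by hand
relative_path <- file.path("input_data", "dataset.csv")

# The absolute path that R would actually look up
absolute_path <- file.path(wd, relative_path)
absolute_path

# TRUE only if the file is really there
file.exists(relative_path)
```

When a project is opened through its .Rproj file, RStudio sets the working directory to the project folder, so the same relative path works for anyone who opens the project.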
Defining (reproducible) file paths
It is never good practice to “hard code” the complete file path of a file: most likely this will break your code as soon as you (or someone else) need to run it on a different machine, let alone within a different OS.
# [NOT REPRODUCIBLE] hard coding your file path --------------------------
library(readr)
# File path on Mac:
dataset <- read_csv("/Users/testuser/R4biostats/input_data/dataset.csv")
# Same file path on Windows (backslashes must be escaped inside R strings):
dataset <- read_csv("C:\\Users\\testuser\\R4biostats\\input_data\\dataset.csv")
This is where the fantastic here package comes in to help, as it will define file paths in a “reproducible manner” as long as you have created an R Project.
# [REPRODUCIBLE!!] Expressing your file path in a system agnostic way! --------
library(here)
library(readr)
# Check which directory here() treats as the project root
here::here()
# [1] "/Users/testuser/R4biostats"
# Then define file path as ("subfolder_name", "file_name")
# No "\" or "/" needed!
dataset <- read_csv(here("input_data", "dataset.csv"))
The here package uses the top-level directory of a project (where you have placed your proj_name.Rproj) as the reference to easily, and portably, build paths to files.
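A defensive variant of the same idea is sketched below: it builds the path with here when that package is available and checks that the file exists before reading it. The helper build_data_path() is hypothetical (it is not part of here), the fallback to getwd() is an assumption made so the example runs even without here installed, and base R's read.csv() is used instead of read_csv() to keep the block self-contained.

```r
# Hypothetical helper: build a path from the project root when here is
# installed, otherwise fall back to the current working directory
build_data_path <- function(...) {
  if (requireNamespace("here", quietly = TRUE)) {
    here::here(...)
  } else {
    file.path(getwd(), ...)
  }
}

csv_path <- build_data_path("input_data", "dataset.csv")

# Read the file only if it is actually there, otherwise report the path tried
if (file.exists(csv_path)) {
  dataset <- utils::read.csv(csv_path)
} else {
  message("File not found: ", csv_path)
}
```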
#? Objects and functions
#? R packages that will be required for the workshop
To install an R package, open an R session and execute:
# Installing (only the 1st time)
pkg_list <- c("tidyverse", "quarto", "rmarkdown", "palmerpenguins")
install.packages(pkg_list)
# Loading a package (at every session)
library("tidyverse")
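Since installation is only needed the first time, it can be convenient to check which of the workshop packages are still missing before installing anything. The sketch below is one possible way to do that; find_missing() is a hypothetical helper written for this example, and the install call is left commented out so the block can be run without network access.

```r
pkg_list <- c("tidyverse", "quarto", "rmarkdown", "palmerpenguins")

# Hypothetical helper: return the packages that cannot be loaded.
# Note that requireNamespace() loads (but does not attach) each package it finds.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- find_missing(pkg_list)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing ones
} else {
  message("All workshop packages are already installed.")
}
```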